Results 1 - 4 of 4
1.
arxiv; 2021.
Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2110.10780v3

ABSTRACT

Despite recent advances in clinical natural language processing (NLP), the clinical and translational research community has been reluctant to adopt NLP models because of their limited transparency, interpretability, and usability. In this study, we propose an open natural language processing development framework and evaluate it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Motivated by interest in information extraction from COVID-19-related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow to generate texts for information extraction tasks without involving human subjects. The corpora were derived from texts from three different institutions (Mayo Clinic, University of Kentucky, University of Minnesota). The gold standard annotations were tested with a single institution's (Mayo) ruleset, yielding F-scores of 0.876, 0.706, and 0.694 on the Mayo, Minnesota, and Kentucky test datasets, respectively. This study, a consortium effort of the N3C NLP subgroup, demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP study and adoption. Although we use COVID-19 as a use case in this effort, our framework is general enough to be applied to other domains of interest in clinical NLP.


Subject(s)
COVID-19
2.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.06.23.21259416

ABSTRACT

Importance: Since late 2019, the novel coronavirus SARS-CoV-2 has given rise to a global pandemic and introduced many health challenges with economic, social, and political consequences. In addition to a complex acute presentation that can affect multiple organ systems, there is mounting evidence of various persistent long-term sequelae. The worldwide scientific community is characterizing a diverse range of seemingly common long-term outcomes associated with SARS-CoV-2 infection, but the underlying assumptions in these studies vary widely making comparisons difficult. Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 infection (PASC or long COVID), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations of long COVID. Observations: We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts of individuals three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to Human Phenotype Ontology (HPO) terms. Conclusions and Relevance: Patients and clinicians often use different terms to describe the same symptom or condition. Addressing the heterogeneous and inconsistent language used to describe the clinical manifestations of long COVID combined with the lack of standardized terminologies for long COVID will provide a necessary foundation for comparison and meta-analysis of different studies. Translating long COVID manifestations into computable HPO terms will improve the analysis, data capture, and classification of long COVID patients. If researchers, clinicians, and patients share a common language, then studies can be compared or pooled more effectively. 
Furthermore, mapping lay terminology to HPO for long COVID manifestations will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, which may improve the stratification and thereby diagnosis and treatment of long COVID.


Subject(s)
COVID-19
3.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.03.20.21253896

ABSTRACT

Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. As the worldwide scientific community forges ahead with efforts to characterize a wide range of outcomes associated with SARS-CoV-2 infection, the proliferation of available data has made it clear that formal definitions are needed in order to design robust studies of Long COVID that consistently capture variation in long-term outcomes. In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.


Subject(s)
COVID-19
4.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.01.12.21249511

ABSTRACT

Background: The majority of U.S. reports of COVID-19 clinical characteristics, disease course, and treatments are from single health systems or focused on one domain. Here we report the creation of the National COVID Cohort Collaborative (N3C), a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative U.S. cohort of COVID-19 cases and controls to date. This multi-center dataset supports robust evidence-based development of predictive and diagnostic tools and informs critical care and policy. Methods and Findings: In a retrospective cohort study of 1,926,526 patients from 34 medical centers nationwide, we stratified patients using a World Health Organization COVID-19 severity scale and demographics; we then evaluated differences between groups over time using multivariable logistic regression. We established vital signs and laboratory values among COVID-19 patients with different severities, providing the foundation for predictive analytics. The cohort included 174,568 adults with severe acute respiratory syndrome associated with SARS-CoV-2 (PCR >99% or antigen <1%) as well as 1,133,848 adult patients that served as lab-negative controls. Among 32,472 hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March/April 2020 to 8.6% in September/October 2020 (p = 0.002 monthly trend). In a multivariable logistic regression model, age, male sex, liver disease, dementia, African-American and Asian race, and obesity were independently associated with higher clinical severity. To demonstrate the utility of the N3C cohort for analytics, we used machine learning (ML) to predict clinical severity and risk factors over time. 
Using 64 inputs available on the first hospital day, we predicted a severe clinical course (death, discharge to hospice, invasive ventilation, or extracorporeal membrane oxygenation) using random forest and XGBoost models (AUROC 0.86 and 0.87 respectively) that were stable over time. The most powerful predictors in these models are patient age and widely available vital sign and laboratory values. The established expected trajectories for many vital signs and laboratory values among patients with different clinical severities validate observations from smaller studies and provide comprehensive insight into COVID-19 characterization in U.S. patients. Conclusions: This is the first description of an ongoing longitudinal observational study of patients seen in diverse clinical settings and geographical regions and is the largest COVID-19 cohort in the United States. Such data are the foundation for ML models that can be the basis for generalizable clinical decision support tools. The N3C Data Enclave is unique in providing transparent, reproducible, easily shared, versioned, and fully auditable data and analytic provenance for national-scale patient-level EHR data. The N3C is built for intensive ML analyses by academic, industry, and citizen scientists internationally. Many observational correlations can inform trial designs and care guidelines for this new disease.


Subject(s)
Dementia, Ossification of Posterior Longitudinal Ligament, Severe Acute Respiratory Syndrome, Obesity, COVID-19, Liver Diseases